Research 

Genome-wide and parental allele-specific analysis 
of CTCF and cohesin DNA binding in mouse 
brain reveals a tissue-specific binding pattern 
and an association with imprinted differentially 
methylated regions 
Samuele M. Amante, Reiner Schulz, and Rebecca J. Oakey
DNA binding factors are essential for regulating gene expression. CTCF and cohesin are DNA binding factors with central 
roles in chromatin organization and gene expression. We determined the sites of CTCF and cohesin binding to DNA in 
mouse brain, genome wide and in an allele-specific manner with high read-depth ChIP-seq. By comparing our results with 
existing data for mouse liver and embryonic stem (ES) cells, we investigated the tissue specificity of CTCF binding sites. ES 
cells have fewer unique CTCF binding sites occupied than liver and brain, consistent with a ground-state pattern of CTCF 
binding that is elaborated during differentiation. CTCF binding sites without the canonical consensus motif were highly 
tissue specific. In brain, a third of CTCF and cohesin binding sites coincide, consistent with the potential for many 
interactions between cohesin and CTCF but also many instances of independent action. In the context of genomic im- 
printing, CTCF and/or cohesin bind to a majority but not all differentially methylated regions, with preferential binding 
to the unmethylated parental allele. Whether the parental allele-specific methylation was established in the parental 
germlines or post-fertilization in the embryo is not a determinant in CTCF or cohesin binding. These findings link CTCF 
and cohesin with the control regions of a subset of imprinted genes, supporting the notion that imprinting control is 
mechanistically diverse. 

[Supplemental material is available for this article.] 



DNA sequences that control transcription are frequently located in 
the noncoding portion of the mammalian genome (The ENCODE 
Project Consortium 2012). These elements can act over long dis- 
tances (Noonan and McCallion 2010). The identification of these 
control elements is important for elucidating human genetic dis- 
ease since genome-wide association studies regularly point to 
noncoding regions as candidates in disease etiology (Manolio 
2010). One of the proteins that contributes to the regulation of 
gene expression across the genome is CTCF (CCCTC-binding fac- 
tor), a protein with 11 zinc fingers (Filippova et al. 1996) and mul- 
tiple regulatory functions (Ohlsson 2001; Gaszner and Felsenfeld 
2006). CTCF can act as an insulator by blocking interactions be- 
tween enhancers and promoters (Bell et al. 1999), it can directly 
regulate chromosomal interactions (Yusufzai and Felsenfeld 2004; 
Hadjur et al. 2009), and it can act as an enhancer of transcription 
(Kuzmin et al. 2005). CTCF binds regions of DNA with high se- 
quence specificity and is sensitive to DNA methylation, having 
a lower binding affinity for methylated DNA (Mukhopadhyay et al. 
2004). The canonical consensus binding motif of CTCF and the sites 
of CTCF binding are evolutionarily conserved between mammals 
and birds (Martin et al. 2011; Schmidt et al. 2012). In vitro assays 
have shown that CTCF can use different combinations of its zinc 
fingers to bind to distinct DNA sequences (Filippova et al. 1996). 
CTCF interacts with a variety of other factors. In particular, the 
cohesin complex, best known for its role in mediating sister- 
chromatid cohesion during cell division, has been found to fre- 
quently colocalize with CTCF during interphase (Parelho et al. 
2008; Rubio et al. 2008; Wendt et al. 2008; Xiao et al. 2011) with 
consequences for gene expression. At specific loci, cohesin is re- 
quired for cell-type-specific long-range chromosomal interactions 
in cis during cellular differentiation (Hadjur et al. 2009). 

Genomic imprinting refers to the parental allele-specific 
transcription of a subset of genes in mammals and flowering plants 
(Reik and Walter 2001; da Rocha et al. 2008). Roughly 140 tran- 
scripts are known to be imprinted in mammals (Schulz et al. 2008). 
Imprinting is controlled by epigenetic modifications that differ 
between the two parental genomes, including differences in DNA 
methylation (Li et al. 1993). Imprinted genes can occur in large, 
coordinately regulated clusters exemplified by the Gnas locus 
(Peters and Williamson 2007); they can form small domains such 
as the Mcts2/H13 locus (Wood et al. 2008) that are comprised of 
only two genes (McCole and Oakey 2008), or they can be single- 
tons like Impact (Hagiwara et al. 1997). In all cases, their parental 
allele-specific expression is ultimately due to an imprinting control 
region (ICR), a region of DNA that is differentially methylated be- 
tween the parental alleles. The parental allele-specific methylation 
of a differentially methylated region (DMR) is in most cases the 
consequence of the sex-specific epigenetic reprogramming of the 
parental germ cells (Edwards and Ferguson-Smith 2007; Bartolomei 
2009). In addition, DMRs are actively protected from post-fertilization 
epigenetic reprogramming (Quenneville et al. 2012) and, thus, persist 
into adulthood. In some cases, the parental allele-specific meth- 
ylation of a DMR is set up post-fertilization during early embryo- 
genesis (somatic DMRs) (Kobayashi et al. 2012). DMRs with a direct 
germline origin are referred to as germline DMRs (gDMRs). Dis- 
ruption of imprinted gene expression after deletion of a DMR is 
considered evidence that the latter functions as an ICR. There are 
22 well-established gDMRs in mouse, of which 19 are maternally 
methylated and three are paternally methylated. Many mecha- 
nisms exist to "translate" allele-specific methylation into differ- 
ential gene expression, including differential protein binding 
(Lewis and Reik 2006). 

CTCF has long been associated with genomic imprinting due 
to its selective binding of the unmethylated maternal allele of the 
Igf2/H19 ICR resulting in parent-of-origin-specific expression of 
Igf2 and H19 in mouse and human (Bell and Felsenfeld 2000; Hark 
et al. 2000; Kanduri et al. 2000; Fedoriw et al. 2004; Szabo et al. 
2004). CTCF has been studied at several other imprinted loci, and it 
binds the unmethylated allele at the gDMRs of Rasgrf1, Peg13, 
Kcnq1ot1 (Yoon et al. 2005; Fitzpatrick et al. 2007; Singh et al. 
2011), and Grb10 (Hikichi et al. 2003; Mukhopadhyay et al. 2004). 
CTCF-mediated regulation is postulated to be one of two major 
control mechanisms operating at ICRs (Lewis and Reik 2006; Kim 
et al. 2009). Cohesin also has been linked to imprinting through 
its association with CTCF at the H19/Igf2 and Kcnq1ot1 DMRs 
(Stedman et al. 2008; Lin et al. 2011), and a role for cohesin in the 
allele-specific organization of higher-order chromatin has been 
proposed (Nativio et al. 2009). Here we present the first compre- 
hensive analysis of allele-specific CTCF and cohesin binding at all 
known DMRs in a single tissue, providing an unbiased assessment 
of the extent to which CTCF and cohesin are involved in im- 
printing control. 

Genome-wide ChIP-seq in mouse ES cells (Chen et al. 2008; 
Kagey et al. 2010) and human cells (Kim et al. 2007b) has shown 
that CTCF and cohesin bind tens of thousands of discrete sites 
across the genome, and CTCF binding is enriched in and near 
genes, consistent with a role in the control of gene expression. 
Mouse embryonic stem (ES) cell data identify CTCF and cohesin 
binding at the gDMRs of Peg13, Zim2 (Peg3), Peg10, Grb10, and 
Mest but not at the H19/Igf2 ICR, even though CTCF is known to be 
important for imprinting regulation at this domain. Imprinting is 
dispensable in ES cells, where loss of imprinting frequently occurs 
without affecting viability in culture (Kim et al. 2007a; Rugg-Gunn 
et al. 2007; Frost et al. 2011). The same is true for Dnmt1-/-, 
Dnmt3a-/-, Dnmt3b-/- triple knockout mouse ES cells, which con- 
sequently lack all DNA methylation imprints yet are viable 
(Tsumura et al. 2006). In contrast, a differentiated tissue where im- 
printing plays an important role is the brain (Davies et al. 2007), and 
this is supported by multiple lines of evidence. Firstly, the human 
imprinting disorders Prader-Willi syndrome and Angelman syn- 
drome present with behavioral and neurodevelopmental pheno- 
types (Cassidy et al. 2000; Lossie et al. 2001; Williams et al. 2006); 
secondly, of the ~140 imprinted gene transcripts in the mouse, 
more than 50 are expressed in brain (Wilkins 2008); thirdly, the 
disruption of certain mouse imprinted genes, including Peg3 (Li 
et al. 1999), Mest (Lefebvre et al. 1998), Nesp55 (Plagge et al. 2005), 
and Grb10 (Garfield et al. 2011), results in behavioral phenotypes; 
finally, genome-wide allele-specific studies of transcription in 
mouse brain suggest that this tissue is a focus for imprinted gene 
expression (Gregg et al. 2010a,b; DeVeale et al. 2012). 



Our analyses of CTCF and cohesin binding in mouse brain are 
based on ChIP-seq data of high quality and an order of magnitude 
higher read depth than existing data. The use of reciprocal inter- 
subspecies hybrid mice enabled independent interrogation of the 
parental alleles in terms of CTCF and cohesin binding in un- 
precedented detail. We examined postnatal day 21 (P21) mouse 
brain, a time point in development shortly after the growth spurt 
in neurogenesis that occurs in the first 2 wk of postnatal devel- 
opment (Lyck et al. 2007). In the adult mouse brain, ~56% of cells 
are neurons and 44% are nonneuronal cells (Fu et al. 2012). Neu- 
rons and the principal type of nonneuronal cells, the macroglia, 
both derive from the neuroepithelium. These data are representa- 
tive of adult rather than immature brain cell types and are un- 
affected by long-term aging effects. 

Results 

We demonstrate that in mouse brain, CTCF and cohesin each bind 
to ~50,000 sites in the genome, with ~27,000 sites bound by both 
factors, indicative of CTCF and cohesin acting throughout the 
genome both in concert as well as independently. Genes are 
enriched for CTCF binding sites, while intergenic regions are de- 
pleted. The binding sites are highly enriched for the canonical 
consensus binding motif. CTCF binding sites are relatively hypo- 
methylated, both in the CpG and non-CpG sequence context. 
Parental allele-specific CTCF binding is rare, with most sites at or 
near imprinted loci. However, a majority but not all DMRs are 
bound by CTCF (or cohesin), and the binding is not necessarily 
allele specific. The Magel2/Peg12 imprinted locus is unique in the 
genome, comprising a cluster of eight allele-specific CTCF binding 
sites. Comprehensive expression profiling in mouse brain of genes 
near allele-specific CTCF binding sites not previously associated 
with imprinting did not reveal novel imprinted genes. No allele- 
specific cohesin binding sites of genome-wide significance were 
found, although at allele-specific CTCF binding sites, there is 
a trend for cohesin to bind the same allele. 

Deep ChIP-seq for CTCF and cohesin to detect parental 
allele-specific binding 

Sites of CTCF and cohesin binding to DNA were determined ge- 
nome wide in whole P21 mouse brain by chromatin immuno- 
precipitation (ChIP) using antibodies specific to CTCF and the 
RAD21 cohesin subunit followed by high-throughput sequencing 
(ChIP-seq). The mice were the offspring of crosses between C57BL/6 
(Bl6) females and Mus musculus castaneus (cast) males (B × C), and 
vice versa (C × B) (Fig. 1A). We generated 235 million and 231 
million high-quality and uniquely mapping sequence reads for 
CTCF and cohesin, respectively (Fig. 1A). The percentage of reads 
representing clonal duplication was below 6.2% for all samples 
(Supplemental Fig. S1). Duplicate reads were excluded from further 
analysis, and regions of CTCF and RAD21 binding were identified 
using USeq and assigned to either the Bl6 or cast genome based on 
known SNPs (Fig. 1B; Supplemental Fig. S2; Supplemental Table S1; 
Keane et al. 2011; Yalcin et al. 2011). 

A systematic read mapping bias toward the reference Bl6 ge- 
nome was observed for both CTCF and cohesin, consistent with 
a Bl6 allele read in a polymorphic region being more likely to 
align. However, our use of reciprocal crosses prevented parental 
allele-specific binding being confounded: There was no overall bias 
toward either of the parental alleles when the reads generated from 
both reciprocal crosses were considered together (Fig. 1C,D). 
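
The way reciprocal crosses cancel a reference mapping bias can be illustrated with a minimal sketch. The read counts are invented for illustration and the function name is hypothetical; this is not the pipeline used in the study. Because the Bl6 genome is maternal in B × C and paternal in C × B, a bias toward Bl6 inflates the maternal count in one cross and the paternal count in the other, so it drops out of the combined totals.

```python
def parental_counts(bxc, cxb):
    """Sum strain-assigned read counts from reciprocal crosses by parent of origin.

    Each argument is a dict with keys 'Bl6' and 'cast' giving the reads
    assigned to each strain at one binding site (hypothetical input format).
    In B x C the mother is Bl6; in C x B the mother is cast.
    """
    maternal = bxc["Bl6"] + cxb["cast"]
    paternal = bxc["cast"] + cxb["Bl6"]
    return maternal, paternal


# Hypothetical biallelic site with a 55:45 mapping bias toward the Bl6 reference.
biallelic = parental_counts({"Bl6": 55, "cast": 45}, {"Bl6": 55, "cast": 45})

# Hypothetical paternally bound site: reads come from cast in B x C
# and from Bl6 in C x B, again with more reads mapping to Bl6.
paternal_only = parental_counts({"Bl6": 0, "cast": 90}, {"Bl6": 110, "cast": 0})

print("biallelic site (maternal, paternal):", biallelic)
print("paternal site  (maternal, paternal):", paternal_only)
```

In this toy case the biased biallelic site still yields equal maternal and paternal totals, while the paternally bound site remains entirely paternal.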
Figure 1. (A) ChIP-seq was performed for CTCF and cohesin (RAD21) 
on P21 brain in B × C and C × B F1 hybrid animals. The experimental 
design and number of uniquely mapped reads taken forward for further 
analysis are shown. (B) Regions of CTCF and RAD21 binding were iden- 
tified using USeq, and regions identified with an FDR of <13 were 
considered significant and were tested for parent-of-origin-specific bind- 
ing. (Black bar) The number of reads for each experiment that fell at, or 
within ±500 bp of, a binding region; (white bar) reads in bind- 
ing regions that aligned over a SNP between C57BL/6 (Bl6) and Mus 
m. castaneus (cast); (gray bar) the number of reads after the paired reads 
are considered together and the best-quality read is used to map the read 
to Bl6 or cast. (Hatched bar) The final number of reads assigned. There was 
a consistent bias toward the reference sequence (C); however, this effect 
was eliminated after we combined B × C and C × B reads (D). 



All three motifs display a high degree of sequence homology, par- 
ticularly at the core 12-bp sequence at the center of the identified 
motifs (Fig. 3A). 

CTCF binding sites are hypomethylated 

The preference of CTCF to bind unmethylated DNA was confirmed 
by assessing the level of cytosine methylation at CTCF binding 
sites in brain. Using genome-wide bisulfite-sequencing (BS-seq) 
data for adult mouse brain (Xie et al. 2012), we compared the 
overall genome-wide level of methylation at cytosine residues, 
separately for CpG dinucleotides and non-CpG cytosines, with the 
portion of the genome corresponding to regions of CTCF binding. 
We found that methylation at CpG dinucleotides appears to have 
a greater influence on CTCF binding than non-CpG methylation. 
Genome wide, 60.8% of CpGs are methylated in the mouse brain, 
in contrast to 51.9% of CpGs in regions of CTCF binding. Non- 
CpG methylation also is less frequent in regions of CTCF binding 
(2.1%) compared with the genome-wide level (2.5%) (Fig. 3B). 
These differences are statistically significant (χ² test, P < 1 × 10⁻⁶). 
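
A 2 × 2 contingency test of the kind reported above can be sketched as follows. The counts are hypothetical: they are chosen only to reproduce the reported CpG methylation percentages, with 60.8% used here as the level outside CTCF regions (an assumption, since the text reports it as the genome-wide level). The totals, and therefore the statistic and P-value, are not those of the study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = inside / outside CTCF regions,
# columns = methylated / unmethylated CpGs.
inside_meth, inside_unmeth = 519, 481        # 51.9% methylated
outside_meth, outside_unmeth = 6080, 3920    # 60.8% methylated

stat, p_value = chi2_2x2(inside_meth, inside_unmeth, outside_meth, outside_unmeth)
print(f"chi-square = {stat:.2f}, P = {p_value:.2e}")
```

With genome-scale counts the same percentage difference would give a far larger statistic, which is why even the small non-CpG difference (2.1% versus 2.5%) can reach significance.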

CTCF preferentially binds near genes 

We explored the genome-wide location of both CTCF binding re- 
gions and parent-of-origin-specific CTCF binding regions using 
the cis-regulatory element annotation system (CEAS) tool (Shin et al. 
2009). CTCF binding is particularly enriched in regions up to ±3 kb 
upstream of and downstream from genes, but is depleted in inter- 
genic regions (Fig. 3C). This is consistent with the insulator function 
of CTCF and, more generally, its involvement in controlling gene 
expression. When we limited our analysis to parent-of-origin-specific 
CTCF binding sites, we found the results to be similar. However, 
intronic regions appeared to be slightly underrepresented and 
intergenic regions slightly overrepresented relative to the distri- 
bution of all CTCF binding sites (Fig. 3C). Given the small number 
of parent-of-origin-specific CTCF binding sites, these differences 
are likely due to chance. 



CTCF and cohesin binding in mouse brain 

Genome wide, we detected 49,358 CTCF and 52,938 cohesin 
binding sites with a high degree of statistical confidence. Of these, 
27,241 sites were bound by both CTCF and cohesin, accounting 
for 55.3% of the CTCF and 51.5% of the cohesin binding sites, 
respectively (Fig. 2). This is consistent with previous studies that 
show both independent and coordinated roles for these factors 
(Wendt et al. 2008; Lin et al. 2011). 
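
The overlap percentages follow directly from the three site counts given above, as this short calculation shows. The variable names are ours, and the union-based fraction is an additional quantity computed here for illustration; it is not a figure stated in this section.

```python
ctcf_sites = 49_358
cohesin_sites = 52_938
shared = 27_241

# Sites bound by at least one factor (inclusion-exclusion).
union = ctcf_sites + cohesin_sites - shared

frac_ctcf = shared / ctcf_sites        # share of CTCF sites also bound by cohesin
frac_cohesin = shared / cohesin_sites  # share of cohesin sites also bound by CTCF
frac_union = shared / union            # share of all bound sites occupied by both

print(f"shared / CTCF    = {frac_ctcf:.1%}")
print(f"shared / cohesin = {frac_cohesin:.1%}")
print(f"shared / union   = {frac_union:.1%}")
```

The first two fractions come out close to the reported 55.3% and 51.5%. The union-based fraction is a little over one third, which may correspond to the statement in the abstract that a third of CTCF and cohesin binding sites coincide.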



Noncanonical CTCF binding sites are tissue specific 

We compared the locations of CTCF binding sites in P21 brain with 
those reported for mouse ES cells and liver (Chen et al. 2008; 







CTCF binds to regions containing the canonical 
consensus motif 

CTCF binds to a specific DNA sequence motif in ES cells (Chen 
et al. 2008) and liver (Schmidt et al. 2012). To search for CTCF 
binding motifs in brain, we applied the MEME de novo motif- 
finding tool to the sequences of all CTCF binding regions in P21 
mouse brain (Fig. 3A). The most significant motif (P = 2.9 × 10⁻¹⁹⁹) 
is highly similar to the published CTCF binding motif (Chen et al. 
2008; Schmidt et al. 2010, 2012). To ensure the consistency of the 
comparison, we repeated the MEME analysis using identical pa- 
rameters on CTCF binding regions previously identified using 
ChIP-seq in ES cells and liver (Chen et al. 2008; Schmidt et al. 2012). 
Again, the canonical motif was identified as the most significant 
motif in both ES cells (P = 7.4 × 10⁻⁹²⁴) and liver (P = 1.4 × 10⁻³⁶⁷). 




Figure 2. Overlap of the 49,358 CTCF and 52,938 cohesin binding 
regions in mouse brain. This demonstrates that just over half of CTCF 
(55%) and cohesin (51%) binding sites are shared, suggesting both in- 
dependent and combinatorial functions for CTCF and cohesin in the 3-wk 
mouse brain. 
Figure 3. CTCF binding analysis. (A) MEME motif finder was executed 
on the CTCF binding locations identified by ChIP-seq in brain and com- 
pared with motifs identified using previously published ES cells and liver 
binding locations. Each data set found the canonical motif with high de- 
grees of certainty. (B) The level of cytosine methylation within CTCF 
binding sites in the brain was compared with that across the whole ge- 
nome using data from Xie et al. (2012). In both the CpG and non-CpG 
contexts, cytosines within CTCF binding sites are 
methylated less than those outside CTCF binding sites (χ² contingency 
table tests, P < 0.001 for CpG and non-CpG context), confirming that 
CTCF prefers to bind to unmethylated DNA. (C) Genomic locations of 
CTCF binding are normalized to the proportion of the genome that 
constitutes each location (represented by the red line). This was consid- 
ered for all CTCF peaks called with an FDR < 13 and separately for the 116 
regions where CTCF binding was seen on one parental allele only (regions 
identified with P < 0.001). CTCF is significantly enriched at genic regions, 
but depleted in distal intergenic regions. Parent-of-origin-specific CTCF 
binding locations are similar but show that binding is depleted in introns 
but not in intergenic regions. 



Schmidt et al. 2012). The incidence of overlap between sites 
reported in different studies increases with increasing the peak size 
used for the comparison. Beyond a certain peak size, increases in 
overlap are mostly due to chance. Therefore, we iteratively in- 
creased peak size and compared the incidence of overlap between 
sites in ES cells, liver, and brain with randomized site locations. 
Beyond a peak size of 1 kb, increases in the incidence of overlaps 
were likely due to chance (Supplemental Fig. S3). For a common 
peak size of 1 kb, 32.0% of all binding sites were shared between ES 



cells, brain, and liver, suggesting that they are invariant during 
differentiation and regardless of cell type (Fig. 4A). Only 1893 
binding sites were occupied exclusively in ES cells (5.1% of ES-cell 
binding sites), suggesting that most CTCF binding sites in ES cells 
represent a ground state that is added to during differentiation, 
with few binding sites being characteristic of pluripotency per se. 
In differentiated tissues, 29.1% of brain and 31.2% of liver binding 
sites are unique to the respective tissue. These analyses were re- 
peated using alternative CTCF ChIP-seq data from ES cells and liver 
(Shen et al. 2012), producing similar results, even though signifi- 
cantly fewer binding sites were identified in these studies because 
of limited read depth and quality (Supplemental Fig. S4). We hy- 
pothesize that the canonical consensus CTCF binding motif may 
be at the core of binding sites that are largely invariant with respect 
to cell type, concordant with other findings (Essien et al. 2009). We 
restricted the above overlap analysis to CTCF binding sites that 
lack the canonical binding motif. There was a large reduction in 
the number of binding sites shared between tissue types (Fig. 4A), 
with most binding sites now being tissue specific: 84.2% of bind- 
ing sites in brain that lack the canonical motif were brain specific, 
and similarly for ES cells (81.2%) and liver (82.9%) (Fig. 4B). These 
results suggest that CTCF binding to tissue-specific sites may involve 
other consensus motifs recognized by cofactors or tissue-specific 
conformations of the 11 zinc finger domains of CTCF itself. 
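
The peak-size procedure described above (resize all peaks to a common width, count overlaps, and compare with randomized locations) can be sketched on toy data. The coordinates, chromosome length, and function names below are hypothetical, a single chromosome is assumed, and the quadratic overlap count is for clarity only; a genome-scale analysis would use a dedicated interval tool.

```python
import random


def resize(peaks, width):
    """Re-centre each (start, end) peak on its midpoint with a fixed width."""
    resized = []
    for start, end in peaks:
        new_start = (start + end) // 2 - width // 2
        resized.append((new_start, new_start + width))
    return resized


def count_overlapping(a, b):
    """Number of half-open intervals in a that overlap at least one interval in b."""
    return sum(any(s < e2 and s2 < e for s2, e2 in b) for s, e in a)


def randomize(peaks, chrom_len, rng):
    """Place each peak at a uniformly random position, preserving its length."""
    placed = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length)
        placed.append((new_start, new_start + length))
    return placed


# Toy peak sets for two tissues on one hypothetical 100-kb chromosome.
tissue_a = [(1000, 1200), (5000, 5300), (20000, 20100)]
tissue_b = [(1500, 1700), (5100, 5200), (40000, 40200)]
rng = random.Random(0)

for width in (200, 1000, 5000):
    a = resize(tissue_a, width)
    b = resize(tissue_b, width)
    observed = count_overlapping(a, b)
    by_chance = count_overlapping(a, randomize(b, 100_000, rng))
    print(f"width={width}: observed={observed}, randomized={by_chance}")
```

The width beyond which the observed count grows no faster than the randomized count is the point at which further overlap is likely due to chance; in a real analysis the randomization would be repeated many times and averaged.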

Parent-of-origin-specific CTCF and/or cohesin binding 
is limited to specific DMRs 

We systematically investigated the binding of CTCF and cohesin at 
or near 22 known well-characterized mouse gDMRs (Table 1) as- 
sociated with imprinted gene expression (most of which are clas- 
sified as ICRs). Of the 22 gDMRs (Table 1), 19 have a CTCF and/or 
cohesin binding site in brain within 2.5 kb. Of these sites, 12 are 
bound by both CTCF and cohesin, three by CTCF alone and four 
by cohesin alone. gDMRs with both CTCF and cohesin binding 
Figure 4. (A) Proportional Venn diagrams comparing coincidence of 
CTCF binding sites between ES cells, liver, and brain demonstrate significant 
overlap of CTCF binding in these tissues. Coincident binding was also con- 
sidered after the removal of binding regions containing the consensus CTCF 
motif; overlap of CTCF binding in the absence of the consensus motif was 
much lower than when all binding sites were considered. (B) The percentages 
of shared peaks for each tissue type for all peaks and for nonmotif peaks. 
sites within 2 kb formed two categories: those where CTCF and 
cohesin colocalized precisely at the gDMR (eight regions), and 
a further four regions where binding occurred near but not over the 
gDMR and CTCF and cohesin each bound distinct sites (Table 1). 
Where the two factors are precisely colocalized on the DNA, 
cohesin binding is probably linked mechanistically to CTCF, while 
this is less likely at gDMRs where binding sites do not coincide. 

For gDMRs where genome-wide significant (P < 1 × 10⁻⁶) 
parent-of-origin-specific binding events for CTCF (Table 1) were 
detected (H19/Igf2, Mest, Peg13, and Zim2 [Peg3]), binding occurred 
as expected on the unmethylated allele (Table 1). The Mest and Zim2 
gDMRs were not previously known to bind CTCF in a parent-of- 
origin-specific manner. In addition, the 95% confidence intervals 
(Supplemental Fig. S5; Supplemental Table S2) for parent-of-origin- 
specific binding showed a trend toward preferential binding of 
CTCF to the unmethylated alleles of the Grb10, Mcts2, Cdh15, 
Nespas, Zrsr1, Peg10, and Meg3/Dlk1 gDMRs. We considered CTCF 
binding to be completely biallelic if the 95% confidence interval 
for the maternal-over-paternal read ratio was between 0.35 and 
0.65 and spanned 0.5. This was the case for the Inpp5f_v2 and 
Plagl1 gDMRs. At H19/Igf2 and Peg13, parent-of-origin-specific 
binding of cohesin was detected but was not genome-wide sig- 
nificant after Bonferroni multiple testing correction (Table 1). The 
overall pattern of the 95% confidence intervals for the ratio of 
maternal-to-paternal reads for CTCF and cohesin suggests that in 
comparison to CTCF, cohesin binding is less biased toward the 
unmethylated parental allele (Supplemental Fig. S5; Supple- 
mental Table S2). This is consistent with increased recruitment of 
cohesin to sites bound by CTCF. 
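
The biallelic criterion described above can be sketched as follows. Two assumptions are ours: the "read ratio" is taken to be the maternal fraction of allele-assigned reads (maternal / (maternal + paternal)), and the interval is computed with the Wilson score method, since the text does not state which confidence interval was used. The read counts are hypothetical.

```python
import math


def wilson_interval(maternal, paternal, z=1.96):
    """Approximate 95% Wilson score interval for the maternal read fraction."""
    n = maternal + paternal
    p = maternal / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def is_biallelic(maternal, paternal, lo=0.35, hi=0.65):
    """True if the interval lies within [lo, hi] and spans 0.5."""
    low, high = wilson_interval(maternal, paternal)
    return lo <= low and high <= hi and low <= 0.5 <= high


# Hypothetical read counts (maternal, paternal) at three sites.
for counts in [(100, 100), (10, 10), (5, 195)]:
    low, high = wilson_interval(*counts)
    print(counts, f"CI = ({low:.3f}, {high:.3f})", "biallelic:", is_biallelic(*counts))
```

Under these assumptions a site with few informative reads is not called biallelic even when its reads split evenly, because its interval is too wide to fall inside 0.35 to 0.65.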

Parent-of-origin-specific CTCF and cohesin binding at gDMRs 
could only be tested where a Bl6-cast SNP is within the bound re- 
gion (Table 1). In addition, CTCF and cohesin peaks did not always 
overlap perfectly so that for some gDMRs, a SNP was informative 
for one factor but not the other. Another limitation for the de- 
tection of parent-of-origin-specific binding arose when a SNP was 
located at the periphery of the respectively bound region where 
fewer reads align and the statistical power of the binomial test was 
diminished. For CTCF, these limitations applied in particular to the 
Grb10, Mcts2, Nnat, Nespas, Zrsr1, Impact, and Peg10 gDMRs. For 
cohesin, the above limitations applied to half of the cohesin- 
bound gDMRs (Supplemental Table S2). The results for CTCF 
support the notion that it plays a central role in imprinting control 
at several loci. This is in contrast to cohesin: in particular, four 
gDMRs (Gnas-exon1A, Igf2r-air, Kcnq1ot1, and Snurf/Snrpn) were 
bound by cohesin but not by CTCF; here binding was not parental 
allele specific. Cohesin binding independently of CTCF is not 
unprecedented (Schmidt et al. 2012), and there is evidence that it is 
more generally involved in transcriptional activation (Kagey et al. 
2010). 
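
The loss of statistical power at SNPs covered by few reads can be made concrete with a small sketch. It assumes an exact two-sided binomial test against an expected allelic fraction of 0.5 and uses the genome-wide threshold of P < 1 × 10⁻⁶ quoted in the text; the exact form of the test used in the study is not specified here, so this is an illustration rather than a reproduction.

```python
from math import comb


def binom_two_sided(k, n):
    """Exact two-sided binomial P-value for k of n reads on one allele (null p = 0.5).

    Under a symmetric null the two-sided P-value is twice the smaller tail,
    capped at 1.
    """
    tail = min(k, n - k)
    p_value = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p_value)


def min_reads_for_significance(alpha=1e-6):
    """Smallest read count at which fully monoallelic reads give P < alpha."""
    n = 1
    while binom_two_sided(0, n) >= alpha:
        n += 1
    return n


print("P for 0 of 20 reads on one allele:", binom_two_sided(0, 20))
print("minimum informative reads needed:", min_reads_for_significance(1e-6))
```

Under these assumptions, a site covered by fewer than about 20 allele-informative reads cannot reach genome-wide significance even if every read derives from one parental allele, which is consistent with the limitation described for SNPs at the periphery of a bound region.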

Genome-wide, parental allele-specific binding of CTCF 
and cohesin is rare and mostly restricted to imprinted loci 

CTCF binding efficiency to methylated DNA is reduced compared 
with unmethylated DNA (Mukhopadhyay et al. 2004) explaining 
parental allele-specific binding at the H19/Igf2 and other gDMRs. If 
CTCF and cohesin are exerting a key regulatory role at several 
imprinted loci, then genome wide, other occurrences of parental 
allele-specific CTCF and/or cohesin binding may identify novel 
DMRs and imprinted genes. Four known ICRs (H19/Igf2, Peg13, 
Zim2 [Peg3], and Mest) met the genome-wide significance threshold 
for parental allele-specific CTCF binding, providing proof of principle. 



Only an additional 17 regions reached genome-wide signifi- 
cance (Fig. 5; Supplemental Fig. S3; Table 2). Eight of these sites 
clustered in a 250-kb region on chromosome 7 at the Peg12/Magel2 
imprinted domain (Fig. 6). A further four sites were within 6 Mb of 
other known imprinted regions. Two more are 30 kb apart on chro- 
mosome 14 (Fig. 5). Many chromosomes were devoid of parental 
allele-specific CTCF binding, and no cohesin binding regions were 
detected at genome-wide significance (Supplemental Table S3). Of all 
21 genome-wide significant parental allele-specific CTCF binding 
sites, six were on the maternal and 15 on the paternal allele. We tested 
all gene transcripts at or near these sites not previously reported as 
imprinted for parental allele-specific expression in mouse brain. 
Many showed a complex organization of transcripts (Supplemen- 
tal Fig. S6), but none were imprinted (Supplemental Table S4). 

Eight sites of parental allele-specific CTCF binding at the 
Peg12/Magel2 imprinted domain (Fig. 6) bound CTCF on the pa- 
ternal allele, indicating maternal methylation. We assayed meth- 
ylation of the CpG island at the promoter of Magel2, which is in 
close proximity to two CTCF sites and maternally methylated 
(Supplemental Fig. S7). This is confirmation of the parental allele- 
specific methylation of the region recently reported (Xie et al. 
2012). The Magel2 DMR is likely somatic and established post- 
fertilization, which is supported by genome-wide methylation data in 
oocytes (Smallwood et al. 2011). In addition, Dnmt3L-/+ 8.5 days 
postcoitum (dpc) embryos are unchanged at the Magel2 promoter 
relative to wild type and are unmethylated (Proudhon et al. 2012). 
This suggests that the maternal allele-specific methylation at the 
Magel2 promoter, and presumably the other sites of paternal allele- 
specific CTCF binding in the domain, is established post-implan- 
tation and/or is brain specific. The regulation of the imprinted 
domain comprising Ndn, Magel2, Mkrn3, and Peg12 deserves fur- 
ther study since the human orthologs of Ndn, Magel2, and Mkrn3 
are in the region associated with Prader-Willi syndrome (Lee and 
Wevrick 2000), with patients displaying a notable range of neu- 
rological symptoms. Given this extensive investigation of parental 
allele-specific CTCF binding, we predict that there are few addi- 
tional DMRs in the adult mouse brain bound by CTCF. 

Validation of CTCF and cohesin binding at specific loci 

Quantitative assays for H19/Igf2, Peg10, Nap1l5, Nnat, and Grb10 
DMRs validated that the ChIP-seq data (Supplemental Fig. S8A,B) 

Figure 5. Chromosomal location of genome-wide significant parent- 
of-origin-specific CTCF binding regions. Where CTCF is bound on the 
maternally inherited allele, this is illustrated with a circle; where CTCF is 
bound on the paternally inherited allele, this is illustrated with a square. 
Only chromosomes where parent-of-origin-specific binding was seen are 
shown (chromosomes 2, 3, 6, 7, 10, 13, 14, and 15). CpG density is indicated. 
Table 2. 21 regions were identified where CTCF binds to one allele in a parent-of-origin-specific manner 
After correction for multiple testing, regions are ranked in order of statistical significance (P-value). Twelve regions are associated with known imprinted 
genes, of which eight are associated with the Peg12/Magel2 imprinted locus. Four further regions occur within close proximity of an imprinted locus. All 
novel candidates were tested for imprinting (Supplemental Table S4). 



results were in agreement (Table 1) with the exception of CTCF 
binding at Nnat and cohesin binding at Peg10. Both are borderline 
cases. Using qPCR to detect CTCF binding at the Nnat DMR 
resulted in P = 0.08, just above our cutoff for binding. At Peg10, 
RAD21 binding was detected by qPCR, but no peak was identified 
by ChIP-seq. When the stringency of the ChIP-seq peak detection 
is relaxed, two RAD21 binding regions ~1 kb either side of the 
qPCR regions are detected (Supplemental Fig. S8C). 

Validation of parental allele-specific binding 

To validate allele-specific binding, we pyrosequenced ChIP'd mouse 
brain from reciprocal crosses. We selected three representa- 
tive DMRs, based on the ChIP-seq results: Inpp5f_v2, where 
biallelic CTCF binding was detected; Mest, where we detected pa- 
ternal allele-specific binding; and Peg10, where the CTCF binding 
site did not meet the significance threshold for allele-specific 



binding but where the 95% confidence interval was suggestive of 
CTCF binding on the paternal allele. These results agreed with our 
ChIP-seq data (Supplemental Fig. S9): Inpp5f_v2 does not deviate 
from the expected 50:50 allelic ratio (P = 0.3214), Mest shows pa- 
ternal binding (P = 0.0017), and Peg10 shows a bias toward en- 
richment of the paternal allele (P = 0.0813). 

CTCF and cohesin binding at somatic DMRs 

A set of 23 known somatic and novel putative somatic DMR co- 
ordinates has recently been defined by whole-genome bisulfite 
sequencing (BS-seq) in mouse brain (Xie et al. 2012). We evaluated 
CTCF and cohesin binding in the somatic DMRs identified in this 
study (Supplemental Table S5). We found 13 instances of CTCF 
binding, two of which were parental allele specific (P < 1 × 10⁻⁶) 
and 14 instances of cohesin binding, none of which were parental 
allele specific. All parental allele-specific binding involved the 
Figure 6. Multiple parent-of-origin-specific CTCF binding sites are observed on the paternal allele at the Magel2/Peg12 locus. (Triangles) Paternally 
bound CTCF binding sites. Genes and CpG islands are indicated. This region represents a unique example in the mouse genome of CTCF bound only on 
the paternal allele at eight regions in close proximity. This figure was adapted from the UCSC Genome Browser. 
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unmethylated allele. Overall, the results for somatic DMRs (Sup- 
plemental Table S6) are in close agreement with the results for 
gDMRs (Table 1) so that the origin of a DMR, germline versus so- 
matic, is not a determinant of CTCF and/ or cohesin involvement 
in the regulation of imprinting. 

Discussion 

CTCF and cohesin act synergistically and independently 
in mouse brain and show tissue-specific distributions 
compared with undifferentiated cells 

Several studies have examined the colocalization of CTCF with 
cohesin (Chen et al. 2008; Parelho et al. 2008; Rubio et al. 2008; 
Kagey et al. 2010; Schmidt et al. 2012), and CTCF physically associates 
with cohesin via the Stag1 (Scc3/SA1) subunit in human cells 
(Wendt et al. 2008). Here, 55% of CTCF binding sites overlap with 
cohesin binding, and the remaining sites bind independently 
(Fig. 2), indicating that CTCF fulfils a role independent from as well 
as in combination with cohesin in brain. This supports the idea that 
different functions may be a result of the context of CTCF binding 
(Gaszner and Felsenfeld 2006), and it is possible that coordinate 
binding of cohesin may influence CTCF. Cohesin is involved in 
tissue-specific transcriptional control (Faure et al. 2012) and asso- 
ciated with the Mediator complex, which has a role in transcrip- 
tional activation (Taatjes 2012). Studies have shown a link between 
cohesin, the Mediator complex, transcription, and chromatin loop- 
ing (Kagey et al. 2010). We report that 51% of cohesin sites in brain 
are not coincident with CTCF, consistent with CTCF not being re- 
quired for the loading of cohesin onto DNA (Rubio et al. 2008). 

This comparison of CTCF binding in ES cells, liver, and brain 
reveals more unique CTCF binding sites in differentiated cells than 
in ES cells, suggesting tissue-specific CTCF binding in the specifi- 
cation and/or maintenance of differentiated tissue. We observe 
a significant overlap in CTCF binding between tissues (Fig. 4A), 
consistent with studies reporting highly conserved CTCF binding 
between cell types (Kim et al. 2007b). 

CTCF binding 

In brain, CTCF binds to unmethylated regions of DNA, usually to 
the canonical CTCF motif (Chen et al. 2008; Schmidt et al. 2010, 
2012). CTCF binding outside this motif is much more tissue specific, 
and there is little overlap between tissues (Fig. 4A). This is 
consistent with evidence from studies of CTCF knockout mice, 
which exhibit embryonic lethality prior to implantation (Splinter 
et al. 2006). We found that the canonical consensus binding motif 
is most frequent at CTCF binding sites shared between ES cells, 
brain, and liver; thus, it is associated with invariant binding during 
differentiation. CTCF binding appears overrepresented just upstream or 
downstream from gene bodies, with a paucity in distal intergenic 
regions, unsurprising given the known role of CTCF in gene 
expression (Bell et al. 1999; Cuddapah et al. 2009). CTCF mediates 
long-range chromosomal interactions genome wide in cis and in 
trans in ES cells (Handoko et al. 2011), and we provide additional 
evidence genome wide that CTCF is an insulator at or near gene 
coding regions by binding to noncoding DNA. 

Cytosine methylation at CTCF binding regions 

Cytosine methylation in both a CpG and non-CpG context is 
reduced in regions of CTCF binding compared with the level observed 
genome wide, consistent with published data (Mukhopadhyay 
et al. 2004). Interestingly, the canonical motif lacks CpG 
dinucleotides, suggesting that methylation of DNA in the motif does 
not preclude CTCF binding, but surrounding methylation is 
important. The canonical motif may not function alone, but in 
concert with another region of DNA ~20 bp downstream, 
suggesting that CTCF interaction with DNA is not limited to the 20-mer 
motifs (Schmidt et al. 2012). 

Parent-of-origin-specific CTCF and cohesin binding 

CTCF and cohesin bind at numerous imprinting control regions 
and other DMRs, as previously detected but not systematically 
tested. The presence of CTCF and cohesin together at 12 
imprinting-associated gDMRs in brain (Table 1) is consistent with 
a regulatory role for these proteins at imprinted loci. Studies using 
3C and 4C have shown that several imprinted domains are 
physically clustered (Sandhu et al. 2009), in part because CTCF 
(Botta et al. 2010) and cohesin (Murrell et al. 2004; Nativio et al. 
2009) form loops that contribute to three-dimensional (3D) nu- 
clear architecture (Phillips and Corces 2009). The CTCF and 
cohesin binding (Table 1) supports the idea of three types of im- 
printing mechanisms: CTCF dependent, CTCF/cohesin medi- 
ated, and CTCF/cohesin independent. 

CTCF and cohesin bind at somatic DMRs, suggesting a role 
for them here. Parental allele-specific binding of CTCF together 
with cohesin regulates allele-specific expression in somatic cells 
(Lin et al. 2011), while cohesin binding alone may be involved 
in the transcriptional regulation of imprinted gene expression 
generally. CTCF and cohesin are likely to have distinct functions 
in different cell types at a subset of targets (Lin et al. 2011), and 
findings in mouse brain support the idea that these proteins play 
a role in imprinting at some loci and at others they act more 
generally. 

These data provide a resource for interrogating the roles of 
CTCF and cohesin and point to a role at more imprinted loci than 
was previously appreciated, although further functional studies 
would be needed to confirm this. Three gDMRs do not bind either 
factor, illustrating the heterogeneous nature of gDMRs as a group 
of regulatory regions. For example, the four imprinted retrogene/ 
host gene pairs Mcts2/H13, Nap1l5/Herc3, Inpp5f/Inpp5f_v2, and 
Zrsr1/Commd1 share several sequence-based and genomic 
context-related features (Wood et al. 2007). Within this group, 
Mcts2 binds both CTCF and cohesin together, Nap1l5 binds neither 
CTCF nor cohesin, Inpp5f_v2 binds CTCF on both parental alleles 
equally, and Zrsr1 binds both CTCF and cohesin, but they do not 
colocalize. This suggests no consistent mechanism for imprinting 
control despite the other shared features. 

CTCF binding profiles vary between different tissues. We 
show that many CTCF binding sites are shared between ES cells 
and differentiated tissues and that this type of invariant CTCF 
binding is associated with the canonical CTCF motif. CTCF bind- 
ing in the absence of the canonical motif is associated with tissue- 
specific CTCF binding. 

Methods 

Chromatin immunoprecipitation 

Chromatin from whole tissue was isolated, sonicated, and 
immunoprecipitated for ChIP-seq library preparation according to 
Supplemental Methods 1. 

Next-generation sequencing 

Library preparation 

DNA enriched through ChIP was quantified using the Qubit 
(Invitrogen) and Quant-iT dsDNA high-sensitivity assay kit 
(Invitrogen:Q32854) and was sized using the Agilent Bioanalyzer 
with a High Sensitivity DNA Bioanalyzer kit (5067-4626). DNA was 
fragmented to a size appropriate for the library preparation step 
using the Covaris S220; samples were sheared over two cycles: 5% 
duty cycle, intensity 3, 200 cycles per burst, and a time of 65 sec. DNA 
from ChIPs performed on chromatin extracted from two mice was 
pooled. 

ChIP-seq libraries were prepared using the Illumina ChIP-seq 
library preparation kit (IP-102-1001) and the NEBNext ChIP-seq 
library preparation kit (E6240). Libraries were sized and quantified 
using an Agilent Bioanalyzer and a High Sensitivity Kit (5067-4626). 

ChIP-seq data analysis 

Sequence reads were aligned to the mouse reference genome 
(mm9) using Novoalign (v. 2.01.13; http://www.novocraft.com/). 
USeq (Nix et al. 2008) was used to identify mean peak shift 
separately for CTCF and cohesin reads using only the first of each 
paired-end matched read. Peaks were identified using peak shifts and 
window sizes of 138 bp and 144 bp for CTCF and cohesin, 
respectively, and a False Discovery Rate (FDR) of 95% (Supplemental 
Table S2A). A subset of peaks was obtained at a false discovery rate 
of 5% (Phred-scaled FDR 13), expanded by 500 bp upstream and 
downstream, and overlapping peaks were merged prior to further 
analysis. Refer to Supplemental Table S2A for the number of raw 
reads that pass quality control and map in a CTCF or cohesin binding 
region. 
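
The peak post-processing described above can be illustrated with a minimal sketch. It is not the USeq implementation; the function names, the tuple representation of peaks, and the example coordinates are hypothetical. It shows why a 5% FDR corresponds to a Phred-scaled value of about 13, and how peaks might be expanded by 500 bp on each side and merged when they overlap.

```python
import math

def phred_scaled(fdr):
    """Phred-scale an FDR given as a fraction: -10 * log10(fdr)."""
    return -10.0 * math.log10(fdr)

def expand_and_merge(peaks, flank=500):
    """Expand (chrom, start, end) peaks by `flank` bp per side, then merge overlaps.

    Assumptions: coordinates are 0-based integers, starts are clipped at 0,
    chromosome ends are not clipped, and book-ended intervals are merged.
    """
    expanded = sorted(
        (chrom, max(0, start - flank), end + flank) for chrom, start, end in peaks
    )
    merged = []
    for chrom, start, end in expanded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous region: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

# Toy peaks (made-up coordinates, for illustration only).
peaks = [("chr1", 1000, 1200), ("chr1", 1900, 2100), ("chr2", 5000, 5200)]
merged = expand_and_merge(peaks)
```

In this toy input the two chr1 peaks are 700 bp apart, so after expansion they overlap and are reported as a single region.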

Parental allele-specific binding analysis 

Parental allele-specific binding was assessed by binomial testing, us- 
ing a custom bioinformatics pipeline. For performance reasons, only 
reads of interest, which overlapped the previously identified CTCF or 
RAD21 binding sites and a SNP between the two parental strains, 
were extracted from the SAM files and used for subsequent analysis. 

Individual reads were assigned to one of the parental alleles 
using a custom Perl script, using the SAMtools Perl library. Each 
read was mapped as either derived from the reference sequence 
(Bl6) or from the cast allele on the basis of a SNP between the 
parental strains. If more than one SNP was present, the SNP with the 
best quality of read sequence was used. Reads were only considered 
for subsequent analysis if the Phred-scaled alignment mapping 
quality exceeded 50 and the base call quality at the SNP used for 
mapping of the read exceeded 20. 

Paired reads were mapped to parental strains separately and 
merged. Because paired reads are not independent data points, 
when they were in disagreement (<1%) the read pair was assigned 
on the basis of the best SNP in either of the two reads. 
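
The read-assignment rules above can be sketched as follows. This is an illustration in Python rather than the custom Perl script used in the study; the dictionary layout of a read (`mapq`, `snp_calls`) and the allele labels `"ref"` and `"cast"` are hypothetical. Treating a mate that fails the mapping quality filter as uninformative is also an assumption.

```python
MIN_MAPQ = 50   # Phred-scaled mapping quality must exceed this value
MIN_BASEQ = 20  # base call quality at the SNP must exceed this value

def best_snp_call(read):
    """Return (allele, base_quality) for the best-quality usable SNP, or None.

    `read` is a hypothetical dict: {"mapq": int, "snp_calls": [(allele, baseq), ...]}.
    """
    if read["mapq"] <= MIN_MAPQ:
        return None
    usable = [(allele, q) for allele, q in read["snp_calls"] if q > MIN_BASEQ]
    if not usable:
        return None
    return max(usable, key=lambda call: call[1])

def assign_pair(read1, read2):
    """Assign one allele to a read pair, using the best SNP in either mate."""
    calls = [c for c in (best_snp_call(read1), best_snp_call(read2)) if c is not None]
    if not calls:
        return None
    # A pair counts as one data point; disagreements are settled by base quality.
    return max(calls, key=lambda call: call[1])[0]

agree = assign_pair({"mapq": 70, "snp_calls": [("ref", 30)]},
                    {"mapq": 70, "snp_calls": [("ref", 25)]})
disagree = assign_pair({"mapq": 70, "snp_calls": [("ref", 25)]},
                       {"mapq": 70, "snp_calls": [("cast", 38)]})
filtered = assign_pair({"mapq": 40, "snp_calls": [("ref", 35)]},
                       {"mapq": 50, "snp_calls": [("cast", 35)]})
```

Returning a single allele per pair keeps the two mates from being counted as independent observations in the later binomial test.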

Assigned reads were converted to maternally or paternally 
derived, and data from both B × C and C × B reciprocal crosses 
were merged for the CTCF and RAD21 data sets independently. 
Counts of maternal and paternal reads were obtained on a per-re- 
gion basis using MySQL. Binding regions were only tested for 
parent-of-origin-specific expression if three or more reads could be 
mapped. 
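
A minimal sketch of this conversion and counting step is given below, in Python rather than the MySQL queries used in the study. It assumes that the mother is listed first in each cross label (so that "BxC" denotes a Bl6 mother and a cast father); the labels, the tuple layout, and the function names are hypothetical.

```python
from collections import Counter

# Assumption: mother listed first, e.g. "BxC" = Bl6 (reference) mother x cast father.
MATERNAL_STRAIN = {"BxC": "ref", "CxB": "cast"}

def parent_of_origin(cross, strain_allele):
    """Translate a strain assignment into maternal/paternal for a given cross."""
    return "maternal" if MATERNAL_STRAIN[cross] == strain_allele else "paternal"

def count_by_region(assigned_reads, min_reads=3):
    """Merge reciprocal crosses and count maternal/paternal reads per region.

    `assigned_reads` is an iterable of (region_id, cross, strain_allele).
    Regions with fewer than `min_reads` assigned reads are dropped.
    """
    counts = {}
    for region, cross, allele in assigned_reads:
        counts.setdefault(region, Counter())[parent_of_origin(cross, allele)] += 1
    return {
        region: (c["maternal"], c["paternal"])
        for region, c in counts.items()
        if c["maternal"] + c["paternal"] >= min_reads
    }

# Toy reads (made up for illustration).
reads = [
    ("peak1", "BxC", "cast"),
    ("peak1", "CxB", "ref"),
    ("peak1", "BxC", "cast"),
    ("peak2", "BxC", "ref"),
]
region_counts = count_by_region(reads)
```

Merging reciprocal crosses in this way means that a strain-specific binding bias contributes to both parental counts, whereas a parent-of-origin bias accumulates on one side.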

Parental allele-specific binding was assessed using a two-sided 
binomial test (implemented in R) of the maternal-versus-paternal 
allelic read counts. Regions were sorted by P-value score using 
MySQL. The genome-wide significance of P-values was assessed by 
means of Bonferroni correction. UCSC BED tracks were prepared at 
different cutoffs with maternal/paternal annotation. 



Peak intersections 

All subsequent bioinformatic analyses were performed on expanded 
regions unless otherwise specified. CTCF peak intersections be- 
tween ES cell, brain, and liver data were performed using an opti- 
mized peak size of 1 kb for all data sets. ES cell data were converted to 
mm9 using the UCSC liftOver tool. Peak overlap counts were 
obtained using the BEDTools intersectBed command. For each 
intersection, counts of the intersecting peaks were calculated in both 
possible ways, and the peak count reported was the mean of the 
two measurements. For intersections of more than two data sets, 
only one of the possible configurations of intersections was exam- 
ined; the same configuration was used for all analyses. 
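
The reason for averaging the two directions can be seen in a small sketch. This is a plain-Python stand-in for the interval intersection, not the BEDTools command line used in the study; the counting rule (a peak is counted once if it overlaps any peak in the other set) and all names and coordinates are assumptions.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def count_overlapping(query, target):
    """Number of query peaks that overlap at least one target peak."""
    return sum(1 for q in query if any(overlaps(q, t) for t in target))

def mean_intersection(set_a, set_b):
    """Mean of the A-versus-B and B-versus-A overlap counts."""
    return (count_overlapping(set_a, set_b) + count_overlapping(set_b, set_a)) / 2

# Toy 1-kb peaks (made-up coordinates).
brain = [("chr1", 100, 1100), ("chr1", 1050, 2050), ("chr2", 0, 1000)]
liver = [("chr1", 1000, 2000)]

a_in_b = count_overlapping(brain, liver)
b_in_a = count_overlapping(liver, brain)
shared = mean_intersection(brain, liver)
```

Here two brain peaks overlap a single liver peak, so the count depends on which data set is used as the query; the mean gives one figure for the shared sites.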

Identification of non-motif-containing peaks 

Peaks that did not contain the CTCF motif were identified using 
the FIMO tool from the MEME suite (Grant et al. 2011). Peak se- 
quences were obtained using the UCSC Genome Browser table tool 
in FASTA format with repeat sequences masked. The CTCF motif 
identified from the brain data set was used throughout, and the 
threshold for detection was set to 10⁻³. Custom UNIX shell scripts were 
used to extract the coordinates of the peaks from the FIMO output. 

Motif finding 

Motif finding was performed using MEME (Bailey et al. 2009) on 
binding regions using default MEME parameters. For the brain 
data, the best subwindow coordinates were used. 

Genomic distribution of peaks 

The CEAS tool (Shin et al. 2009) was used to assess the genomic 
distribution of unexpanded CTCF binding regions, and unex- 
panded parent-of-origin-specific CTCF binding sites were detected 
with a P < 0.001. Relative abundance was normalized to the pro- 
portion of the genome represented by each genomic region. For 
the CEAS analysis, parent-of-origin-specific CTCF expanded re- 
gions were assigned back to their original constituent unexpanded 
peaks using bedmap (Neph et al. 2012). 
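
The normalization of relative abundance can be written as a simple ratio. The sketch below is not the CEAS implementation; it assumes that each peak is assigned to exactly one genomic category, and the category names and numbers are toy values chosen for illustration.

```python
def relative_abundance(peak_counts, genome_bp):
    """Fraction of peaks in each category divided by that category's genome fraction.

    Values above 1 indicate overrepresentation relative to genome coverage.
    """
    total_peaks = sum(peak_counts.values())
    total_bp = sum(genome_bp.values())
    return {
        category: (peak_counts[category] / total_peaks) / (genome_bp[category] / total_bp)
        for category in peak_counts
    }

# Toy values, not results from the study.
peak_counts = {"promoter": 30, "gene_body": 50, "distal_intergenic": 20}
genome_bp = {"promoter": 10, "gene_body": 40, "distal_intergenic": 50}
enrichment = relative_abundance(peak_counts, genome_bp)
```

In this toy case the promoter category holds 30% of peaks but 10% of the genome, giving a relative abundance of about 3, whereas the distal intergenic category falls below 1.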

Quantitative PCR validation of CTCF and cohesin ChIP-seq 

These assays are detailed in Supplemental Methods 1. 

Validation of parent-of-origin-specific binding using pyrosequencing 

Chromatin was extracted from four biological replicates, two B × C 
and two C × B P21 brains. Pyrosequencing validated CTCF binding 
at three regions (Supplemental Table S6). ChIP was performed as 
for ChIP-seq. Maternal-to-paternal proportions were assigned 
using SNPs between Bl6 and cast. Allelic proportions were 
normalized to input DNA, which represents a 50:50 ratio of 
maternal-to-paternal reads. Using the normalized maternal proportion, a 
two-sided t-test against a 0.5 null proportion was performed. 

Testing for imprinted expression 

Transcripts were tested for allele-specific expression using PCR fol- 
lowed by Sanger sequencing (using the primers in Supplemental 
Table S7). 

Bisulfite mutagenesis 

Genomic DNA from B × C and C × B intercross mouse brain tissue 
was converted using the Zymo EZ DNA Methylation-Direct Kit 
(D5020). Amplified regions of interest were ligated into pGEM-T 
Easy (Promega:A1360), transformed into competent Escherichia 
coli, and sequenced. Primers were designed with MethPrimer 
(For: GTGTTTGTTGAGAGTTGTTGAGAGA; Rev: 
ACCAAACAACCATAAAAACCTACAA) (Li et al. 2002). 

Data access 

Primary sequencing data have been deposited in the NCBI Gene 
Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) 
under accession number GSE35140. 